A Novel Integrated Classifier for Handling Data Warehouse Anomalies

نویسندگان

  • Peter Darcy
  • Bela Stantic
  • Abdul Sattar
چکیده

Within databases employed in various commercial sectors, anomalies continue to persist and hinder the overall integrity of data. Typically, Duplicate, Wrong andMissed observations of spatial-temporal data causes the user to be not able to accurately utilise recorded information. In literature, different methods have been mentioned to clean data which fall into the category of either deterministic and probabilistic approaches. However, we believe that to ensure the maximum integrity, a data cleaning methodology must have properties of both of these categories to effectively eliminate the anomalies. To realise this, we have proposed a method which relies both on integrated deterministic and probabilistic classifiers using fusion techniques. We have empirically evaluated the proposed concept with state-of-the-art techniques and found that our approach improves the integrity of the resulting data set.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Comprehensive Mathematical Model for the Design of a Dynamic Cellular Manufacturing System Integrated with Production Planning and Several Manufacturing Attributes

    Dynamic cellular manufacturing systems,   Mixed-integer non-linear programming,   Production planning, Manufacturing attributes   This paper presents a novel mixed-integer non-linear programming model for the design of a dynamic cellular manufacturing system (DCMS) based on production planning (PP) decisions and several manufacturing attributes. Such an integrated DCMS model with an extensi...

متن کامل

SUBCLASS FUZZY-SVM CLASSIFIER AS AN EFFICIENT METHOD TO ENHANCE THE MASS DETECTION IN MAMMOGRAMS

This paper is concerned with the development of a novel classifier for automatic mass detection of mammograms, based on contourlet feature extraction in conjunction with statistical and fuzzy classifiers. In this method, mammograms are segmented into regions of interest (ROI) in order to extract features including geometrical and contourlet coefficients. The extracted features benefit from...

متن کامل

Formal approach to modelling a multiversion data warehouse

A data warehouse (DW) is a large centralized database that stores data integrated from multiple, usually heterogeneous external data sources (EDSs). DW content is processed by so called On-Line Analytical Processing applications, that analyze business trends, discover anomalies and hidden dependencies between data. These applications are part of decision support systems. EDSs constantly change ...

متن کامل

A Novel Type-2 Adaptive Neuro Fuzzy Inference System Classifier for Modelling Uncertainty in Prediction of Air Pollution Disaster (RESEARCH NOTE)

Type-2 fuzzy set theory is one of the most powerful tools for dealing with the uncertainty and imperfection in dynamic and complex environments. The applications of type-2 fuzzy sets and soft computing methods are rapidly emerging in the ecological fields such as air pollution and weather prediction. The air pollution problem is a major public health problem in many cities of the world. Predict...

متن کامل

A Novel Ensemble Approach for Anomaly Detection in Wireless Sensor Networks Using Time-overlapped Sliding Windows

One of the most important issues concerning the sensor data in the Wireless Sensor Networks (WSNs) is the unexpected data which are acquired from the sensors. Today, there are numerous approaches for detecting anomalies in the WSNs, most of which are based on machine learning methods. In this research, we present a heuristic method based on the concept of “ensemble of classifiers” of data minin...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2011